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Abstract. This paper proposes a new method to combine several densities 
such that each density dominates a separate part of a joint distribution. The 
method is fully unsupervised, i.e. the parameters in the densities and the 
thresholds are simultaneously estimated. The approach uses cdf functions in 
the mixing. This makes it easy to estimate parameters and the resulting den- 
sity is smooth. Our method may be used both when the tails are heavier 
and lighter than the rest of the distribution. The presented model is com- 
pared with other published models and a very simple model using a univariate 
transformation. 

Mixing functions; Heavy and Light Tailed Distributions; Maximum Likelihood; 
Mixture Models. 

1. Introduction 

In many applications, the tail of a probability distribution is of particular in- 
terest, e.g. prediction of floods or estimation of financial reserves in insurance. 
Because extreme data are rare, it is difficult to fit tail models and to support 
parametric model choices convincingly. Most papers study this problem in one di- 
mension assuming a heavy tail. The approach presented in this paper mix different 
distributions that describes different parts of a new joint distribution. The joint 
distribution may have heavier or lighter tail(s) compared to the tail(s) of the dis- 
tribution that is used in the central part of the joint distribution. Moreover, it is 
possible to generalise to i?". 

Common practice in extreme value modelling is to fix a threshold u and to fit a 
distribution, e.g. a generalized Pareto Distribution (GPD), to the data exceeding 
u. There is a number of methods to estimate the parameters once u is fixed, see 
for instance (TB], [Z] and references therein. As is well known, the estimates depend 
significantly on the choice of the threshold, see for instance [ID], Figure 6.2.8. In 
order to reduce model bias, the threshold u should be chosen large, but this often 
leaves very few data points for the estimation of the parameters. Hence, the re- 
sulting parameter estimates will have large variances. Moreover, the selection of an 
appropriate threshold is a difficult task in practice, see for instance [8], [17], [TO] . 
and [2]. Often a supervised analysis is performed, selectively and off-line as part 
of a monitoring scheme. For practitioners, who usually need to perform their data 
analyses regularly, it would be convenient to have automatic and robust approaches 
that do not require an a priori tuning of a threshold. Such an unsupervised ap- 
proach to tail estimation would be of particular relevance in automatic real-time 
monitoring of financial, industrial and environmental quantities, for instance for 
warning purposes. 

Recently, and have proposed two new ways of addressing this question. 
The paper [3] suggests a robust model validation mechanism to guide the threshold 
selection. The procedure assigns weights between zero and one to each data point, 
where a high weight means that the point should be retained since a GPD model 
is fitting it well. The author suggests to start with a low threshold u and increase 
it, thus reducing the number of data points, until all data left have weights close to 
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1. This is a promising method, but thresholding is still needed at the level of the 
weights. 

The paper was the first to suggest a fully unsupervised approach to tail 
estimation. Their approach has three key ingredients. First, the model consists 
of two components, one representing the central part of the distribution and the 
other the tail. Second, all data are modelled in one mixture model, and finally, the 
parameters in the two distributions and the mixing parameters are simultaneously 
estimated. 

Our method is based on the same ideas as those in [llj . but we mix the cumu- 
lative distribution functions (cdfs) instead of densities. This makes our approach 
computationally more efhcient, which in turn makes it easier to generalise to higher 
dimensions. Threshold estimation is more complicated in higher dimensions, since 
the threshold is a surface. However, in our approach, this is handled efficiently. 
This paper only shows examples in one dimension. The multivariate examples are 
reported separately, [H]. We also show how to use several different distributions 
for different parts of the tail(s). Also the method presented in [31 uses cdfs in 
the mixing. But our method gives a continuous density in contrast to the method 
presented in [3]. 

The proposed model is compared with a model based on a univariate transforma- 
tion. The properties of the two models are quite similar. But when generalizing to 
several dimensions there are important differences. The model based on cdfs may 
combine different multivariate densities, but needs to calculate the cdfs and not 
only the densities in the evaluation. The model based on univariate transformation 
does not need to calculate the multivariate cdfs, but the properties are dominated 
by the properties of the chosen multivariate distribution. If we want to change the 
multivariate properties, the method may be combined with a copula approach. 

In many applications it is needed to describe the entire distribution, not only 
the tail(s). In our methods, the user selects densities that he/she believes fits the 
different parts of the data. However, in some cases the density that is used for 
describing the tail also describes the central part of the distribution better than the 
density that is supposed to describe the central part of the density. Then the tail 
density may end up modelling most of the density leading to better overall match 
with data, but with poorer description of the tail. This is easy to identify from the 
estimated threshold and may for example be corrected by putting a prior on the 
threshold. 

We first describe the cdf-model in Section[2]and the transformed model in Section 
[31 Section|3]compares our approach to other models. Then, in Section[51 the models 
are tested with synthetic data and financial data. Finally, Section [S] contains some 
concluding remarks. 



In this section we start with two one-dimensional components in the mixture, 
and then we show how the model may be generalised to several components. 

2.1. Two components. Let x ^ R and let G{x; 6g) and F{x\ Op) be two cdfs that 
we want to combine to a cdf denoted L. Define a threshold u and a mixing zone 
(u — £,u + e) for e > 0, and let the cdfs G and F determine the properties of L 
below and above the mixing zone, respectively. Further, let both cdfs influence L 
in the mixing zone. We will often mix truncated distributions. Hence, we define 
the truncated densities where 
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and 

The corresponding truncated cdfs are defined as 

Gtix-ea)^ [ 9t[t:eG)dt 

J —oo 

and 



Ftix;9F)^ I Ft{t;0F)dt 

J —oo 



where we do not require that Gf(oo;0G) = 1 and Ft(oo;9F) — 1- We then define 
the mixed cdf by 

(1) L{x; Ol) = k{GMx; e,);0G) + Ft{p{x; ep);0F)) 

where k is defined such that L(oo; 9^) — 1 and q{x; 9q) and p{x] 9q) are two mono- 
tone increasing mixing functions described below. Note that k is easily found by 
K = l/(G't(cxD; ^g) + Ft{oo]9F))- The parameters of L{x;9l) include all the other 
parameters i.e. 9l — {SG,(^F,dq,dp)- Equation ([T]) is a well-defined cdf when the 
truncated cdfs Gt and Ft and the mixing functions q and p satisfy the criteria 
specified below. The corresponding density 1{x;9l) is given by 



(2) 1{x;9l) 



Kg{x]9G) \ix<u — e 
K{g{q{x;9q)]9G)q'{x]9q) + 

f{p{x;9p);9F)p'{x;9p)) if .x e {u-e,u + e) 

Hif{x]9F) if a; > It + e 



This requires that q{x] 9q) = p{x; 9q) — x where it is applied for x outside the mixing 
zone. The mixing functions q and p determine how G and F influence L in the 
mixing zone. They are monotonously increasing functions defined on R and with 
range equal to R. If we set e — Q and the two mixing functions equal to the identity 
function i.e. q{x] 9q) — p(x] 9q) = x, then we get the standard approach where only 
data above the threshold is used in the tail estimation and the joint distribution 
is discontinuous. In our approach, we want all data to be used in the estimation 
of a continuous density 1{x;9l)- Then we set e > and the function q maps the 
interval (— oo, u + s) onto (— oo, u) and p maps the interval (u — e, oo) onto (u, oo) 
as illustrated in Figure[T] In order to get the derivative of l{x; 9l) to be continuous, 
we need to have q'{u + e; 9q) ~ q"[u + e\ 9q) = and p'{u — s; 9q) = p'^u — e: 9q) ~ 0. 
Further properties of the mixing function is discussed in Section We have found 
that the two mixing functions p{x] 9p) and q{x] 9q) defined below work well. Define 

x X < u — e 

(3) q{x;9q)={ l{x + u - e) + ^ cos{^^^) u-e<x<u + e 

X — e u + e < X, 



X + e X < u — e 

(4) p{x;9p)^^ i(2; + u + e) - |cos(^^^^) u-e <x <u + e , 

X u + e < X. 

where 9q — 9p — {u,e). Figure [2] shows an example where two Gaussian densities 
are mixed. 
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Figure 1. The mixing functions p{x; 6p) (dotted line) and q{x; 6q) 
(solid line) for m = —1 and e — 0.5. The vertical bars correspond 
to u — £, u and u + e, respectively. 




Figure 2. The two normal densities g{x;9G) ^ ^(0,1) and 
f{x]6F) ^ ^(0,4) are mixed with the mixing zone (0.4,2.4) The 
resulting mixture density l{x] Ol) is given by the solid black line. 

2.2. Several components. Equation ([T]) may easily be generalised to a mixture of 
several truncated cdfs Gi , . . . , Gk with parameters 9gi , . • . , ■ Define a threshold 
Ui and a mixing zone (u^-i — £i-i, Ui + Si), where > and the truncated density 
gi{x; 9g) > only if Wi^i < x < Ui for each component i. We assume that uq — — oo 
and Uk = oo. The resulting cdf is given by 

k 

(5) L{x;eL) = KY,G,{q,{x-dq^)-eG,), 

1=1 

where k is defined such that i(oo; 9l) = 1. Let Gi{ui^i; Og^) — and 

G,(x;0gJ = / 9^{t■,0G)dt 
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for i = 1, • • • ,k. 

The density 1{x;9l) corresponding to the cdf L{x;9l) in Equation (O is given 

by 
(6) 

{Kgi{x;9G,) if a; < - £i 

K{g,{q,{x;9g^);9G,)qi{x;9qJ+ 
g,+i{q,+i{x;9q^^,);9G,+i)q[+i{x;9g.^J) if x > u, - e^ 

assuming we have x £ {ui-i + £i-i,Ui + Ei) for a value of i. The first expression 
denotes the density between two consecutive mixing zones, and the other within a 
mixing zone. 

Each mixing function g^, with parameters 9q^ — (wi-i, e^-i, m^, e,), maps the in- 
terval (iti_i — ei_i , Ui+£i) onto (iti_i , Mi). The mixing functions must be continuous 
and monotonously increasing. Moreover, in order to ensure a smooth transition be- 
tween the densities, we should have 

(7) q'Xx-9q^) + q[+^{x-9q^^,) = l, 

in the mixing zone, and each g^, i = 1, • • • , fc should satisfy 

(8) - £i_i;6',,) = 0, + £j_i; 6*,,) = 1, 

(9) q[{ui - ei\9q.) ^ I and q'-{ui + ei;9q.) = Q. 

We avoid breakpoints in the density corresponding to the cdf L in Equation ^ by 
also requiring 

(10) gfK-i-e»-i;^9J = 0, gfK-i+e^-i;^?.) = 0, 

(11) q'l{u,~e,-9q,)=Q and (w, + £,; = 0. 

One of the major problems in extreme value theory is to estimate the threshold 
u. We reduce this problem by defining the threshold Ui from the equation 

(12) 5i(u^;6'G.) = 5i-(-i(wj;6'G,+i)- 

If there are several values of ui that satisfies the equation, we may select the supre- 
mum or infinum of these values. Equation p2p ensures that there are not large 
changes in /(x; 9^) in the mixing zones. Letting the threshold be a function of the 
other parameters instead of a separate parameter, reduces the number of parame- 
ters. In Section [STT] we also show that this makes the estimation of the parameters 
in the model more stable. Equations ([S]) - Q ensure that 1{x;9l) has continu- 
ous derivative without requiring Equation (fT^ . If gi,gi+i, ■ ■ ■ ,gk have heavier and 
heavier tails or lighter and lighter tails. Equation (fT2|) is particularly natural. If 
the tails are heavier, then k is slightly less than 1 and if the tails are lighter, then 
K is slightly larger than 1. 

There are several possible definitions for qi that satisfy the requirements given 
in Equations (H)-©. We use 
(13) 

X + Si-i X < Ui^i - Ei-i 

2(2: ^ U,-l + e,-l) COS( 26^-i > ~ ^^''^ <X< Ui^l + 

qi{x]9q.)^'{ X Ui^l + Ei-l < X < Ui - 



^{x + Ui - Si) + ^ COs C^^E."''' ) Ui - Ei < X < Ui + Ei 

X Ei 

Figure [3] shows the gi-function. 



2£i 

Ui + Ei < X. 
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Figure 3. The transition function qi{x; 9q.) with Ui-i = —1, Ui = 
1, and Ei-i = Ei = 0.5. The thresholds are given by vertical solid 
lines and the mixing zones are delimited by the vertical dotted 
lines. Note that the transition function maps the interval — 
ei-i,Ui onto {ui^i,Ui). 



3. The transformation model 

An alternative to the model described in the previous section is to transform 
data to a known density like what is done in a normal score transform. We will 
present a method of this type where we focus on the tail behaviour and make it 
quite similar to the method presented in the previous section. Since we focus on the 
tails where there are few data points, we use a parametric transformation instead 
of an empiric transformation. The authors are well aware that the main argument 
for this model is that it is mathematically convenient and not that it is based on 
classical statistical principles. There are many similarities between this approach 
and the method described in the previous section, and the results are as good as 
for the other method. 

Let X ^ R and let G{x\ 9g) be a cdf where we want to change the tail behaviour. 
Let q{x] 9q) be a monotone increasing function and define the new cdf by the func- 
tion 

(14) L{x-eL)^G{q{x-eq)-eG) 

which is a valid cdf. The density is obviously 

(15) i{x-eL) = 9{q{x;eq)-eG)q'{x-eq) 

where g and q' denote the derivative of G and q respectively. We get heavier tail 
if \q{x]6q)\ < \x\ and lighter tail if \q{x;dq)\ > \x\ for x in the tail of g. There is a 
large variety of alternatives for the function q. Using the same notation as in the 
previous section we define a mixing zone {u ~ e,u + e) where we let q{x) = a; in 
the central part of the distribution and q{x) = f{x) on the tail side (outside) of 
the mixing zone. We have found that f{x) = u{x/u)^ gives good results. If we 
let G be the normal distribution, we see from Equation (fT5| that 1{x,0l) get the 
asymptotic behaviour 

\u\^-^\xf-^PeM-\u?'^^\x?^). 
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We want q'{x] 9q) continuously difFerentiable in order to get l{x] 0l) continuously 
differentiable. Therefore, we propose the following function 

X X < u — e 

(16) + u-e<x<u + s 

u{x/u)^ u-\- e < X 

where £ is a fixed constant determining the length of the transition zone and c, d, ki 
and fc2 are chosen in order to get l{x] Ol) smooth. We have the following equations 

f[u + e)-l^^,f"{u + e) 



(17) 



(18) 



f"{u + e)-d{2ef 



{2e) 



ki-2 



in order to get q{x] 9g) twice continuously differentiable where the constants satisfy 
fci > 3 and ^2 > 2. The function q is smooth with fci = 4 and ^2 = 3. See 
Figure 13] for an illustration of a q{x;Oq) function and the corresponding density. 
The parameters in the model, 0g,/3 and u should be found from data. The length 
of the mixing zone should be set as a constant or connected to the variance of 
G{x) since it is difficult to estimate this from data. Similarly to combine several 
components in we may have several mixing zones in (|16p . 




Figure 4. The density in the transformed normal model with 
g{x; Oq) ^ N{Q, 1). The figure also shows q{x, Oq)/lQ with mixing 
zone (0.3, 1.3) and q{x, 0q) = (x/u)°-^ 



4. Comparison with other models 

The traditional method in extreme value modelling is to fix one or two thresh- 
olds, and use only values further out in the tails than these thresholds for parameter 
estimation. By using Equation ^ with three components, fixing the Ui's in ad- 
vance, and letting Si — for all components, the cdf-method proposed is identical 
with the traditional one. 
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(20) L{x;eL) 



Equation ^ bears resemblance with the mixed model 

(19) l2{x;ei) = -L-{p{x;0p)g{x;eG) + (1 - 0p))/(x; 0f)), 

proposed by [TT]. Here / and g are the densities of F and G respectively, and 
Z{9i) is an integrational constant. The integrational constant is generally found 
by numerical integration, which is likely to make the maximum likelihood estima- 
tion unstable and computationally expensive. By mixing the cdfs instead of the 
densities, we often get analytic expressions for the integrational constant, and the 
parameter estimation becomes more stable. Otherwise, it makes little difference 
whether the mixing is based on the densities or the cdfs. However, the increased 
efficiency of our model as compared to Equation p9p makes it more manageable to 
use in several dimensions. 

In [3] it is proposed to use the cdf 

G{x; 9q) X < u 

G{u;0G) + {l-Giu;0G))F{x;0F) x>u 

This is identical with the mixing model ([T]) if we assume there is no mixing zone, 
i.e. s — and f{x) is replaced with cf{x) for a constant c such that k = 1. 
By introducing a mixing zone we obtain a continuous density. As shown in the 
example, this does not imply an increase in the number of parameters that need 
to be estimated. It only makes the result more plausible and offers more stable 
estimation of the parameters since the density is smooth. 

In the recent preprint 'P another model of the same type is proposed in the 
context of neural networks. It is proposed to mix a normal distribution with a 
GPD distribution with the restriction on the parameters such that the density and 
the derivative of the density are the same on both sides of the thresholds. This 
gives a smooth density without a mixing zone. Their model has one parameter 
less than the GPD-normal model presented in this paper since the requirement of 
a continuous derivative of the density eliminates the scaling parameter in the GPD 
density. This implies that variance of the normal distribution is connected to the 
scaling of the tail in the mixed model. It is neither easy to generalize their model 
to other densities than GPD nor to several dimensions. 



5. Numerical applications 

We illustrate the proposed models on both synthetic data and real financial 
data. All parameter estimation is performed by maximizing the log-likelihood. 
The maximisation is done numerically using the routine nlminb in R. This seems 
to work very well in all tests performed. 

5.1. Synthetic data. In this section, we test three different models. We generate 
synthetic data from one of the models and then estimate parameters and quantiles 
in all the proposed models. The first two models are mixtures of the form defined 
by Equation (O with fc = 3 and the normal distribution as the central distribution. 
The first model has the generalized Pareto Distribution (GPD) distribution in both 
tails and the second has the WeibuU distribution in both tails. The GPD cdf is 

G(x;e,a)-l-(l + i^)-i 

assuming ^ > 0, ct > and x > 0, and the cdf in the WeibuU distribution is 

G(x;/3,A) = l-cxp{-{xXf) 

for /3 > 0, A > and x > 0. This gives 10 parameters in the model described by 
Equation ([5]), 2 in each of the three distributions in addition to ui,U2,Si and £2- 
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Table 1. Average of estimated parameters and standard devia- 
tion of the estimates when simulated using a GPD-Normal-GPD 
distribution. 10m indicates the use of 10.000 data points in the 
sample. 



01 0-2 ^2 Oa ^5 Ul U2 

Simulation GPD-N-GPD 0.300 0.400 1.000 0.200 0.400 -2.166 2.415 

Est. GPD-N-GPD lOm 0.299 0.401 1.000 0.200 0.400 -2.165 2.417 

St.dev. GPD-N-GPD 10m 0.019 0.017 0.010 0.021 0.020 0.067 0.081 

Est. GPD-N-GPD 0.295 0.404 0.996 0.195 0.403 -2.131 2.390 

St. dev. GPD-N-GPD 0.068 0.058 0.035 0.063 0.061 0.260 0.311 

Est. WeibuU-N-WeibuU 0.511 0.211 1.000 0.606 0.253 -2.391 2.513 

St.dev. WeibuU-N-Weibull 0.058 0.061 0.031 0.061 0.059 0.170 0.315 

Est. transf. N - 0.410 1.060 - 0.489 -1.755 1.914 

St.dev. transf. N - 0.069 0.031 - 0.104 0.190 0.251 



The thresholds ui and U2 are determined from Equation p2p . This reduces the 
number of parameters and also gives smoother distributions. The length of the 
transition intervals, 2£i, are not critical in the estimation and not easy to estimate. 
Hence, we set = cr2 for i = 1 and i = 2 where 02 is the standard deviation of the 
central normal distribution. In all the tests we set the expectation in the central 
density equal to leaving 5 unknown parameters. 

The third model we test is the transformation model described in Equation (jl4p 
with the polynomial function for q given in Equation (|16p . Also here we set Ei = tT2 
for i = 1 and i = 2 in order to make the same choice as in the previous model. 
We denote a2 as the standard deviation of the normal distribution in order to use 
the same symbol with the corresponding parameter in the other models. Also here 
there are 5 unknown parameters. 

In the tests we simulate m — 1000 samples from each of the models in turn 
and then estimate parameters in all the models. In addition, we also test with 
10m = 10.000 samples with the same model as is used in the simulation. This is 
repeated k — 500 where we estimate the parameters in each of the three models, the 
corresponding 0.001,0.01,0.99 and 0.999 quantiles, the difference in Li norm, and 
the log-likelihood value. The tables give the average and the standard deviation of 
the estimated parameters/quantiles/values. The difference in Li norm is estimated 
by dividing the state space into 100 intervals. The difference in Li norm is half 
the sum of the absolute value of the difference in probability between the estimated 
and the original density in each interval. Simulation from a distribution where the 
density differs in the Li norm by 0.01 compared to the correct density, implies that 
0.01 of the samples are from a wrong distribution. 

The parameters are estimated by maximizing the log-likelihood. The model with 
GPD distributions in the tails is left with the following 5 parameters: ^1, (Ti, (T2, ^3, 
CTs and the model with WeibuU distribution in the tails has the parameters /?i , Ai , (T2 , 
/33, A3. In these two models the thresholds ui and U2 are found from Equation 
based on the other parameters. The model with transformations has the parameters 
Ml, M2, cr2, /Ss- In Tables [U [31 and[51 the parameters are denoted 61,62, 02, 63,64, 
Ul, and U2 where 61,62,63,64, have a different interpretation in the different models. 
The results of the simulations are shown in Tables [T]-[6l We see that there are quite 
good estimates for all the parameters. The standard deviation is comparable with 
estimation of a in the normal distribution with the same sample size. Only 1- 
4% of the m — 1000 samples are from the tails and in the mixing zone these are 
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Table 2. Quantiles, Li-error and log- likelihood using GPD-N-GPD. 







90.001 


90.01 


90.99 


90.999 




logliken. 


Simulation GPD-N-GPD 


-9.15 


-3.92 


3.00 


5.91 




-1583.5 


Est. GPD-N-GPD 10m 


-9.14 


-3.92 


3.00 


5.91 


0.005 


-15833.0 


St.dcv. GPD-N-GPD 10;// 


0..-.4 


0.12 


0.08 


0.31 


().()()2 


99.5 


Est. GPD-N-GPD 


-9.20 


-3.90 


2.99 


5.88 


0.016 


-1581.2 


St. dev. GPD-N-GPD 


1.96 


0.41 


0.24 


0.96 


0.007 


30.5 


Est. Weibull-N-WeibuU 


-9.34 


-4.12 


3.10 


6.15 


0.015 


-1583.6 


St.dcv. Weibull-N-WeibuU 


1.70 


0.49 


0.28 


0.85 


0.008 


33.2 


Est. transf. N 


-8.41 


-4.08 


3.22 


5.95 


0.025 


-1583.8 


St.dev. transf. N 


1.66 


0.50 


0.32 


1.01 


0.006 


33.4 



Table 3. Average of estimated parameters and standard devi- 
ation of the estimates when simulated using a WeibuU-Normal- 
Weibull distribution. 











0-2 


0A 


^5 




U2 


Simulation Weibull-N-Weibull 


0.500 


0.200 


1.000 


0.600 


0.250 


-2.394 


2.487 


Est. Weibull-N-Weibull 10m 


0.502 


0.202 


0.999 


0.600 


0.251 


-2.388 


2.483 


St.dev. Weibull-N-Weibull 10m 


0.019 


0.021 


0.010 


0.021 


0.021 


0.052 


0.063 


Est. Weibull-N-Weibull 


0.511 


0.211 


1.000 


0.606 


0.253 


-2.390 


2.514 


St.dev. Weibull-N-Weibull 


0.058 


0.061 


0.031 


0.061 


0.059 


0.170 


0.315 


Est. GPD-N-GPD 


0.383 


0.331 


1.000 


0.333 


0.298 


-2.314 


2.590 


St.dev. GPD-N-GPD 


0.091 


0.071 


0.035 


0.125 


0.111 


0.265 


0.440 


Est. transf. N 




0.410 


1.060 




0.489 


1.755 


1.914 


St.dev. transf. N 




0.069 


0.031 




0.104 


0.190 


0.251 



Table 4. Quantiles, Li-error and log-likelihood using WeibuU- 
Nor mal- WeibuU . 







90.001 


90.01 


90.99 


90.999 


Li 


loglikeh. 


Simulation Weibull-N-Weibull 


-9.45 


-4.18 


3.12 


6.21 




-1585.7 


Est. WcibuU-N-WcibuU 10m 


-9.45 


-4.18 


3.13 


6.22 


0.005 


- 15874.9 


St.dev. Weibull-N-Weibull 10m 


0.51 


0.14 


0.08 


0.28 


0.002 


99.4 


Est. Weibull-N-Weibull 


-9.34 


-4.12 


3.10 


6.15 


0.015 


-1583.6 


St.dev. Weibull-N-Weibull 


1.70 


0.49 


0.278 


0.848 


0.008 


33.2 


Est. GPD-N-GPD 


-11.41 


-4.08 


3.11 


7.82 


0.017 


-1584.7 


St.(kn-. GPD-N-GPD 


3.02 


0.51 


0.34 


2.18 


0.008 


33.4 


Est. transf. N 


-8.41 


-4.08 


3.22 


5.95 


0.025 


-1583.8 


St.dev. transf. N 


1.66 


0.50 


0.32 


1.01 


0.006 


33.4 



mixed with data points from the central distribution. The standard deviations 
for the different parameters have about the same size including (T2, the standard 
deviation in the central distribution. The estimates for the thresholds Ui have a 
larger standard deviation than the other parameters, indicating that the threshold 
should be determined implicitly by the other parameters. We have also tried to 
estimate these simultaneously with the other parameters. Then all parameters 
have larger uncertainty. Also the quantiles and the density measured in the Li 
error are quite close to the quantiles and density that were used in the simulation. 
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Table 5. Average of estimated parameters and standard devia- 
tion of the estimates when simulated using a transformed normal 
distribution. The transformed distribution uses Ui as one of the 
five estimated parameters (instead of 9i and 64) while Ui depend 
on the other parameters in the other models. 







O2 


CT2 


04 


05 


Ul 


U2 


Simulation transf. N 




0.450 


1.000 




0.600 


-1.500 


1.500 


Est. transf. N 10m 




0.450 


1.000 




0.600 


-1.498 


1.500 


St. dev. transf. N 10m 




0.015 


0.008 




0.021 


0.034 


0.052 


Est. transf. N 




0.458 


0.996 




0.605 


1.484 


1.489 


St. dev. transf. N 




0.056 


0.028 




0.075 


0.118 


0.198 


Est. GPD-N-GPD 


0.343 


0.360 


0.928 


0.241 


0.387 


-1.985 


2.001 


St.dev. GPD-N-GPD 


0.075 


0.056 


0.038 


0.116 


0.106 


0.221 


0.467 


Est. WeibuU-N-WeibuU 


0.554 


0.256 


0.926 


0.685 


0.339 


-2.074 


2.028 


St.dev. WeibuU-N-WeibuU 


0.055 


0.059 


0.033 


0.078 


0.080 


0.159 


0.298 



Table 6. Quantiles, Li-error and log-likelihood using a trans- 
formed normal distribution. 





90.001 


90.01 


90.99 


90.999 


Li 


loglikch. 


Simulation transf. N 


-7.51 


-4.00 


3.13 


5.02 




-1548.2 


Est. transf. N 10m 


-7.51 


-4.00 


3.13 


5.02 


0.005 


-15483.9 


St.dev. transf. N 10m 


0.33 


0.11 


0.07 


0.17 


0.002 


95.8 


Est. transf. N 


-7.52 


-3.99 


3.11 


5.02 


0.018 


-1545.7 


St.dev. transf. N 


1.17 


0.38 


0.20 


0.54 


0.008 


31.1 


Est. GPD-N-GPD 


-10.2 


-3.94 


3.12 


6.79 


0.028 


- 1550.2 


St.dev. GPD-N-GPD 


2.31 


0.42 


0.29 


1.76 


0.007 


31.5 


Est. WeibuU-N-WeibuU 


-8.44 


-3.98 


3.10 


5.71 


0.025 


1548.0 


St.dev. WeibuU-N-WeibuU 


1.27 


0.40 


0.24 


0.63 


0.006 


31.3 



The uncertainty in the 90.001 a-^d 90.999 quantiles in the GPD distribution is larger 
than for the other distributions since this has heavier tails than the two other 
distributions. As expected, we always get better estimates when we fit the same 
model as is used in the simulation and when we increase the number of samples to 
10m = 10.000. 

5.2. Financial data. We want to illustrate the use of the models on real data 
and have selected three stock market indices; the European, the American and 
the Japanese. It is not our ambition to suggest the best possible model for these 
data. That would require a more complex model including for example handling of 
stochastic volatility which is outside the topic of this paper. We study each of the 
stock markets independently using the methods from Sections [2] and [H 

We assume that the three stock markets can be represented by the corresponding 
Morgan Stanley (MSCI) price indices in local currency neglecting the currency risk 
in the portfolio. We use index data from the period 01.01.1987 to 28.05.2002 for 
model estimation. This period corresponds to m = 4065 observations. The return 
series are shown in Figure[5l Let Xi^t denote the original indices, z = 1, 2, 3. We use 
the data r^^f = log{xi+i^t/ Xi^t) — Mi where is determined such that r.i^t — 0. 

Figureinishows normal QQ-plots for the standardised logarithmic residuals r^^t /cji 
for each of the three markets where ai is the standard deviation of r^^f. As can be 



L. HOLDEN AND O. HAUG 



Europe 




USA 




Japan 




Figure 5. European, American and Japanese geometric return 
series for the period 01.01.1987 - 28.05.2002. 
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Figure 6. QQ-plots of the standardized residuals fitted against 
the normal distribution. 
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Table 7. Parameter estimates for the GPD-N-GPD mixture 
model using residuals. We have Ei = 02- 



Parameter 


Europe 


USA 


Japan 




0.266 


0.156 


0.432 


0-1 


0.00395 


0.00564 


0.00831 


0-2 


0.00409 


0.00320 


0.00377 




0.0735 


0.198 


0.0724 




0.00498 


0.00680 


0.00801 


log-likelihood 


13609 


13227 


12339 



Table 8. Parameter estimates for the WeibuU-N-WeibuU mixture 
model using residuals. We have Ei = <72- 



Parameter 


Europe 


USA 


Japan 


/?i 


0.686 


0.801 


0.987 


Ai 


0.00357 


0.00546 


0.00873 


02 


0.00454 


0.00344 


0.00368 




0.885 


0.977 


0.948 


A3 


0.00464 


0.00664 


0.00845 


log-likelihood 


13612 


13223 


12334 



seen from the figure, all distributions are doubly heavy-tailed. Moreover, they are 
clearly skewed, having one tail heavier than the other. This motivates for the 
use of a mixture distribution with three components, one for the left tail, one for 
the centre of the distribution, and one for the right tail, respectively. Hence, we 
use the distribution given in Equation ([5]) with three components. Exactly as in 
Section 15.11 we test with the GPD and the WeibuU density in both tails and with 
the normal distribution in the centre. In addition, we test with the transformed 
normal distribution given in Equation ()14p . 

In all cases we have the same 5 parameters as described in Section 15.11 that are 
estimated by maximizing the likelihood 

ni 

(21) ^l{r^,t:QL). 

t=\ 

where l{r;9L) is given by Equations (O and respectively. The results are 
shown for each of the models in Tables [7] - [D For the transformed normal model, 
three of the threshold values ended up equal to the limit ±0.005. This indicates 
that we get best fit with using the transformation for all negative/positive values. 
The parameter (T2 is then only used to set the density for x = and influences 
the joint density in the mixing zones (—0.01, 0) and (0, 0.01). We get best fit using 
GPD-N-GPD, then WeibuU-N- WeibuU and then transformed normal density. The 
estimated quantiles are given in Table [Tl] together with the corresponding results 
from the multivariate distributions. 

5.2.1. Parameter estimates for NIG distribution. We have chosen to compare the 
results using the mixture model with the ones obtained using the Normal Inverse 
Gaussian (NIG) distribution. This distribution has been used for financial applica- 
tions, both as the conditional distribution of a GARCH-model, see [5], and as the 
unconditional return distribution, see [1] . The paper |19) compares different prob- 
ability distributions for the innovations in one-dimensional processes. The authors 
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Table 9. Parameter estimates for the transformed normal model 
using residuals. We have Si = 0.005. 



Parameter 


Europe 


USA 


Japan 


Ul 


-0.007 


-0.006 


- 0.005 


U2 


0.0085 


0.005 


0.005 


Pi 


0.550 


0.565 


0.665 


0-2 


0.00674 


0.00659 


0.00778 


P3 


0.583 


0.665 


0.655 


log-likelihood 


13591 


13188 


12300 



Table 10. Parameter estimates for NIG distributions for loga- 
rithmic residuals. 



Parameter 


Europe 


USA 


Japan 




0.00134 


0.000746 


-0.000308 


6 


0.00678 


0.00733 


0.00945 


a 


78.3 


67.2 


57.0 


P 


-12.6 


-3.85 


1.17 



consider a NIG distribution, a skewed Student's t-distribution and a non-parametric 
kernel approximation. They report that the NIG distribution provides the best fit 
overall for the models considered. 

The normal inverse Gaussian (NIG) distribution is a generalised hyperbolic dis- 
tribution with A = — ^. Its density is 

Sa exp (s^/a'^ — P^") Ki (a \J 6"^ + {x — /i)^ j exp (/3 {x — ^)) 

= ^ ^ , 

■n J P + {x — ^) 

where 5 > and < |/?| < a. In the above expression, Ki is the modified Bessel 
function of the third kind of order 1, see [T]. The parameters /i and 5 determine 
the location and scale, respectively, while a and P control the shape of the density. 
In particular, [3 = Q corresponds to a symmetric distribution. 

The parameters of the NIG distribution are estimated using the EM-algorithm 
described in [13 , with the moment estimates as starting values. The parameter 
estimates are shown in Tabic [TUl 

5.2.2. Comparing the models in the tails. We have used graphical logarithmic left 
and right hand tail tests to examine the fit in the tails. The graphical tests were 
performed as follows. Let (-'^(i), A'(jv)) denote the order statistic of the historical 
data, and F{x) the estimated cumulative distribution function of the fitted distribu- 
tion. For the NIG distribution this is calculated using the method described in [15] . 
A plot of log(F(Ar(t))) against superimposed on a plot of log {t/{N + 1)) against 
AT^t) shows the left tail fit for the fitted distribution, and a plot of log(l — 
against A'(t), superimposed on a plot of log {{N + 1 — t)/{N + 1)), the right tail fit. 

Figure[7|shows the plots. All the models give quite similar results but the mixture 
distribution with GPD tails followed by the NIG distribution seems slightly better 
than the others. 
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Left tail Europe Right tail Europe 




Figure 7. Plot of the tail behaviour in the five models. The circles 
correspond to the empirical data, the light-blue line to the mixture 
distribution with WeibuU tails, the black line to the mixture dis- 
tribution with GPD tail, the red to the NIG distribution and the 
blue line to the transformed normal. For reference, a normal fit is 
also included shown in green. 
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Table 11. Comparing the quantiles and likelihood between the 
different models and the data. The first four lines are the em- 
pirical data and then the models using GPD, WeibuU and trans- 
formation that assumes independence between the three variables. 
NOP is number of parameters, Qa^p the quantiles and loglik. the 
log-likelihood of the estimated variables. 



Model 


NOP 


p 




Qu,p 


QJ,P 


QA,p 


loglik. 


Data 




0.01 


-0.0293 


-0.0273 


-0.0345 


-0.0114 








0.99 


0.0235 


0.0272 


0.0359 


0.0110 








0.001 


-0.0645 


-0 0688 


-0 0585 


-0.0171 








0.999 


0.0504 


0.0506 


0.0721 


0.0285 




U.GPD-N-GPD 


3x5 


0.01 


-0.0302 


-0.0310 


-0.0357 


- 0.00580 


39178 






0.99 


0.0244 


0.0280 


0.0365 


0.00626 








0.001 


-0.0683 


-0.0600 


-0.0595 


-0.0113 








0.999 


0.0413 


0.0454 


0.0632 


-0.0115 




U. W-N-W 


3x5 


0.01 


-0.0299 


-0.0311 


-0.0349 


-0.000569 


39168 






0.99 


0.0241 


0.0276 


0.0358 


0.00511 








0.001 


-0.0558 


-0.0547 


-0.0557 


-0.0115 








0.999 


0.0391 


0.0439 


0.0582 


0.0117 




U. NT 


3x5 


0.01 


-0.0306 


-0.0325 


-0.0350 


-0.00875 


39079 






0.99 


0.0244 


0.0273 


0.0360 


0.00895 








0.001 


-0.0512 


-0.0535 


-0.0535 


-0.0105 








0.999 


0.0397 


0.0418 


0.0553 


0.0111 





6. Summary and conclusions 

In this paper we present a new method to mix densities from different models. 
The method is inspired by [11]. But we mix cdfs instead of densities since this 
is much more computationally stable and efficient making it possible to extend to 
several dimensions in contrast to the method described in [TT]. The paper [5] also 
combines cdfs but by introducing a mixing zone we obtain continuous densities. We 
also show how univariate transformations may be used in order to represent tail 
behaviour. 

The different models are tested by simulation from one model and then estimate 
parameters and quantiles from all the models. The suitability of each model depends 
on the data in each case. We compare the different models on a financial data set, 
evaluating likelihood and tail behaviours. The different models seem to behave 
quite similar. 

Before we select a model we should analyse the data and then find a model with 
suitable tail behaviour. In the multivariate case we need to find the univariate tail 
behaviour and the correlation in the tails of the data. 
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